The vulnerability of face recognition systems to spoofing attacks has
piqued the biometric community's interest, motivating the development of
anti-spoofing techniques to secure them. Photo, video, or mask attacks
(types of presentation attacks) can compromise face biometric systems.
Spoofing attacks are detected using liveness detection techniques, which
determine whether the facial image presented to a biometric system is a live
face or a fake version of it. We discuss the classification of face
anti-spoofing techniques in this paper. Anti-spoofing techniques are divided
into two categories: hardware and software methods. Hardware-based techniques
are summarized briefly. Software-based countermeasures for presentation
attacks, which are further divided into static and dynamic methods, are
studied comprehensively. We cite a few publicly available presentation attack
datasets and calculate a few metrics to demonstrate the value of
anti-spoofing techniques.

1. INTRODUCTION

Biometric recognition systems for identifying people have grown tremendously and are deployed at large
scale for population census, workspace access control, access control to sensitive information, forensics to
identify criminals, online transactions, law enforcement applications, and so on, for either user
identification or verification. Biometric recognition is an automated process to authenticate and/or identify
an individual using his or her physiological or behavioral traits. Examples of physiological characteristics
are fingerprints, palmprints, iris, face, and deoxyribonucleic acid (DNA), and a few examples of behavioral
characteristics are gait, voice, handwritten signatures, and keystrokes. Biometric identification has various
merits compared to traditional identification systems, which use a smart card and password, as a biometric
is a personal key that can never be lost or forgotten. Biometrics are always attached to the user and are not
easy to share or forge [1].

As noted above, biometrics utilize a user's unique biological trait for identity verification. This
type of authentication is inherence-based: it relies on something that you are. The other types rely on
something that only you have (possession-based authentication, such as a smart card) or on a secret that only
you know (knowledge-based authentication, such as a password). A biometric authentication system is
vulnerable to spoofing attacks, in which a fraudster makes an effort to compromise it. The type of spoofing
attack changes with the biometric modality, that is, whether the biometric technique uses an iris, a
fingerprint, a face, a keystroke, or a voice. The attacks on one trait cannot easily be compared with those on
another, and hence there is a requirement for specifically designed algorithms to identify the spoofs.

Each biometric has its own merits and flaws. For example, the fingerprint is most commonly used for
commercial purposes to provide evidence of a person's presence; however, providing the fingerprint
impression requires strong user cooperation. Iris recognition is extremely precise, but it depends on image
quality and requires users to actively participate in the scanning process. Face recognition is favorable in
terms of availability and reliability. In this biometric system, a user's face can be captured without their
knowledge or consent, meaning they need not cooperate, and recognition can be done from longer
distances [2].

Fraudsters take advantage of the vulnerabilities of a secure biometric system. In a spoofing attack, 
an individual attempts to masquerade as another person to get through a secured biometric recognition 
system. So, there is a significant requirement for anti-spoofing techniques to secure the biometric recognition 
systems. We need preventive measures to defend against unauthorized replicas of biometric traits. 

Amongst all the biometric systems deployed for commercial purposes, face biometrics play a
pivotal role since they are widely used in national border control, physical or logical access control,
forensics, e-commerce, surveillance, and e-governance domains. In a science fiction film, a young man
disguised himself as an elderly person to board a plane to Canada, using a silicone mask on his face to fool
border control agents [3]. Such attacks are feasible because facial images can be captured in a non-intrusive
manner using low-cost sensors [4]. Hence, spoofing as an authorized individual using their information is the
biggest threat to biometric systems. Ramachandra and Busch [4] report a black hat test that reveals the
spoofing process in face recognition in laptops from various vendors. These cases demonstrate the loopholes
of a real-time face recognition system. The attackers are highly motivated, as they can easily deceive the
system by creating face artifacts at low cost. Various video tutorials are now available on the web that
provide information on how to create face artifacts [5]. Hence, spoofing (and with it anti-spoofing) is the
most urgent problem for researchers to address in the face recognition domain.


2. VULNERABILITIES OF FACE BIOMETRIC SYSTEM 

Face recognition in the biometric system consists of two stages: enrollment and authentication. 
During enrollment, an unknown user's biometric information is recorded and saved as a template in the 
database of the recognition system. Generally, a biometric recognition system can be utilized for either 
verification or identification purposes. A biometric verification system does a one-to-one comparison to 
verify the user's identity. A biometric identification system identifies a user by comparing his biometric 
template with that of all the other users stored in a database. So biometric identification does a one-to-many 
comparison to identify a user from among many users who have enrolled their templates [6]. 
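
The contrast between one-to-one verification and one-to-many identification can be illustrated with a minimal sketch. It assumes that templates are fixed-length feature vectors compared by cosine similarity against a fixed threshold; the vectors, the user names, and the threshold value are hypothetical and are not taken from any system discussed in this paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(probe, claimed_template, threshold=0.9):
    """One-to-one comparison: does the probe match the claimed identity?"""
    return cosine_similarity(probe, claimed_template) >= threshold

def identify(probe, database, threshold=0.9):
    """One-to-many comparison: best-matching enrolled user, or None."""
    best_id, best_score = None, -1.0
    for user_id, template in database.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None

# Hypothetical enrolled templates and a probe captured at authentication time.
database = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
probe = [0.88, 0.12, 0.31]
```

Verification touches a single template, whereas identification scans every enrolled template, so its cost and its chance of a false match both grow with the size of the database.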

The basic architecture of the face recognition system consists of a sensor module that captures the 
face image, a feature extraction module that extracts facial features, and a matcher module that matches the 
input face with already stored templates for authentication. Ramachandra and Busch [4] and Ratha et al. [7]
have identified eight loopholes in a generic biometric system, which are depicted in Figure 1.

Figure 1. Vulnerabilities of biometric recognition systems

These vulnerabilities are categorized into four groups according to Jain et al. [8]:

a. User interface attack: while the sensor is capturing the biometric information, the attacker presents fake
artifacts such as photos, videos, and masks.

b. Channel attack: the channels between the modules are intruded upon by replaying old data, artificially
synthesizing a feature vector, intercepting the channels, or overriding the final decision.

c. Module attack: the feature extractor and the matcher modules are overridden so that their behavior can
be modified.

d. Template database attack: the fraudster modifies a valid user's template in the database by replacing it
with his or her own biometrics.


3. FACE SPOOFING METHOD 

Spoofing is an attempt to intrude into the normal workings of the biometric system by introducing
artificial replicas of biometric traits in an unauthorized manner. A fraudster performs spoofing attacks to
mimic a valid individual to bring down an entire biometric recognition system. To deceive a facial biometric
system, fraudsters use the following spoofing methods: i) Images and masks: a fake user presents an
authorized user's photograph or wears a mask created using prosthetics to imitate that individual;
ii) Identity manipulation: the facial image stored by the biometric system is manipulated with face morphing
software to match the facial characteristics of an unauthorized user; biometric systems can also be tricked by
changing the stored image or by using special makeup; and iii) Identical twins: identical twins can act as
spoofs for each other. The face spoof attacks can be classified as shown in Figure 2 [9].

a. Printed flat photo attack: the most common attack is the use of a flat printed photo against the
recognition system without the permission of the original user (Refer to Figure 2(b)).

b. Cut photo attack: the eyes in the printed copy of the photo are cut out so that the attacker can display
blink behavior and get past a challenge-response anti-spoofing technique (IDLive Face) (Refer to Figure 2(c)).

c. Warped photo attack: a printed photo is bent in a certain way to simulate facial movements (Refer
to Figure 2(d)).

d. Video replay attack: a video is played before the sensor device of a face recognition system using a
phone, tablet, or laptop. In this type of attack, movements such as blinking of the eyes, facial
expressions, and head movements can be reproduced (Refer to Figure 2(e)).

e. Mask attack: an impostor uses one of two types of masks: wearable masks and paper cut masks. This is
the most difficult type of attack. Mask manufacturing is expensive, and it also requires three-dimensional
(3D) scanning and printing devices (Refer to Figures 2(f) and (g)).

Figure 2. Types of presentation attacks: (a) valid user, (b) printed flat photo, (c) eye cut photo,
(d) warped photo, (e) video playback, (f) synthetic mask, and (g) paper cut mask [9]


4. FACE SPOOF DETECTION 

Face spoof detection, face liveness detection, countermeasures against facial spoof attacks, and
face anti-spoofing are used interchangeably to refer to the methods that identify a fraudster trying to gain
access to a facial recognition system by posing as a genuine user. Liveness detection marks the contrast
between fake and real faces, and hence it resembles a face recognition problem. The difference is that face
recognition techniques look for a face representation that increases interpersonal variations (differences
between two persons), whereas anti-spoofing techniques strive for a face representation that reduces
interpersonal variations, thereby increasing intrapersonal variations due to changes in illumination
and pose [10].


Antispoofing in face biometrics: a comprehensive study on software-based techniques (Vinutha H) 


4 0 ISSN: 2722-3221 


Liveness detection can be defined as the ability of the biometric system to recognize a living,
authorized individual. Here biometric authentication involves verifying that the user who initially
enrolled is the same person who is appearing for authentication, and not a 2D photograph or a digital
version of the face [11]. This can be achieved through algorithms that examine the input data collected from
biometric sensors.

Liveness detection can be categorized as active or passive. The active method asks the person to
perform an action that cannot be easily reproduced, such as blinking the eyes or raising the eyebrows.
These are also called challenge-response techniques [12]. BioID offers a challenge-response technique to
identify a spoof attempt: the BioID system challenges the user with random commands, and the response is
validated. The passive method applies techniques to detect a non-live image without user interaction. For
this purpose, the biometric data captured during the enrollment stage is used. IDLive [12], a passive facial
liveness detection technique, runs at the backend and recognizes the spoof attempt without requiring user
interaction.
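
The challenge-response idea behind active liveness detection can be sketched as follows. This is only an illustrative outline, not the BioID protocol: the list of commands is hypothetical, and the detected actions are assumed to come from some action recognizer that is outside the scope of the sketch.

```python
import random

# Hypothetical set of commands the system may ask the user to perform.
CHALLENGES = ("blink", "raise_eyebrows", "turn_head_left", "turn_head_right")

def issue_challenges(count=3, rng=None):
    """Draw a random sequence of commands so that a replayed video cannot anticipate it."""
    rng = rng or random.Random()
    return [rng.choice(CHALLENGES) for _ in range(count)]

def validate_response(challenges, detected_actions):
    """Accept only if the detected actions match the challenges in order."""
    return list(detected_actions) == list(challenges)
```

A static photograph produces no actions at all, and a pre-recorded video is unlikely to contain the randomly chosen sequence, so both would fail the validation step under these assumptions.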


5. CLASSIFICATION OF FACE ANTI-SPOOFING METHODS 

Anti-spoofing methods [13] are categorized into two main groups: hardware-based and software-based
techniques [2]. These are the most widely deployed methods of anti-spoofing, and they are further
grouped into various other methods as shown in Figure 3.

Figure 3. Classification of face anti-spoofing methods


6. HARDWARE-BASED TECHNIQUES 

Hardware-based anti-spoofing techniques incorporate special hardware devices in the biometric
system to detect fake access. In the literature, authors have used different types of hardware devices in their
face biometric setups. These devices utilize different types of imaging technologies such as near-infrared
(NIR) images [14], [15], 3D depth images [16], or complementary infrared (CIR). These approaches compare
the reflectance information of real authenticated faces and spoof artifacts, using light emitting diodes
(LEDs) and photodiodes set up at various wavelengths. Thermal imaging has been explored for liveness
detection by acquiring large datasets of thermal face images with infrared cameras, for real faces as well as
for spoof attempts [17]. Erdogmus and Marcel [18] used depth information of 2D photographs to discover a
2D attack, whereas Wang et al. [16] recover 3D facial information using specialized depth cameras to detect
spoofing attacks. Ng and Chia [19] made use of facial expressions that randomly changed the temporal
information, thereby verifying the liveness of users. Pavlidis and Symosek [15] used a near-infrared band for
disguise detection.

Additional sophisticated hardware increases the overall cost of the system, and most of the
mobile phone cameras and webcams available today are not compatible with it. This has motivated the
anti-spoofing research community to shift to software-based anti-spoofing techniques, which are cheaper and
can be easily installed in existing face biometric systems. Table 1 gives a brief insight into the
hardware-based methods utilized by researchers for spoof detection through the years.

Table 1. Hardware-based anti-spoofing techniques

| Contributors | Method/Equipment |
|---|---|
| Pavlidis and Symosek (2000) | Near infrared bands |
| Ng and Chia [19] | Random temporal cues |
| Dhamecha et al. [17] | Thermal imaging using infrared cameras |
| Erdogmus and Marcel [18] | 2D depth information |
| Wang et al. [16] | 3D facial depth information using depth cameras |
| Z. Zhang et al. [14] | Multispectral reflectance distributions |
| Albakri and Alghowinem [20] | 3D depth cameras |


7. SOFTWARE-BASED TECHNIQUES 

Software-based anti-spoofing techniques make use of an algorithm that classifies the captured face
as either a spoofed face, resulting from a spoofing attack, or a genuine face. These techniques incur lower
cost, are easy to incorporate as a piece of code running inside the face recognition system, and exhibit
higher accuracy. Their major advantage is that they do not require the specialized and costly hardware used
in hardware-based anti-spoofing techniques. In addition, the user need not cooperate: these techniques can
work without the knowledge of the user accessing the face recognition system.

Software-based methods are further categorized into static and dynamic approaches. Static
approaches operate only on spatial information without requiring temporal data; a single image is
considered. If a video is presented and a static approach is used, then each frame of the video sequence is
processed individually. A static approach incurs less cost yet yields good performance. Dynamic
approaches utilize spatio-temporal information of the video played before the face recognition system for
access control. This approach tends to find the relative motion across the video frames that are presented to
the face sensor, and hence it requires more time and effort compared to static approaches. State-of-the-art
static approaches are divided into three groups based on the nature of the algorithms used: texture-based,
frequency-based, and hybrid approaches (Refer to Table 2).

Texture-based approaches analyze the microtextures of the facial image representation. This is the
most popular approach for detecting display and photo attacks because it distinguishes among pigment
formation (introduced while printing), specular reflection (due to quality variation), and shade
(arising from a display attack). Maatta et al. [21] first came up with a widely used texture-based approach
based on local binary patterns (LBPs) and detected photo print attacks. Edmunds [10] proposes
software-based protection methods based on the LBP descriptor to evaluate texture-based anti-spoofing
techniques, focusing on differences between natural and unnatural characteristics present in the face region.
Here the LBP operator embeds color and contrast information in the texture characterization, and the images
are combined with the HSI color model to form an HSI-LBP color texture descriptor, which improves
texture-based anti-spoofing on the Replay-Attack, CASIA, and MSU databases. Chingovska et al. [22] also
used LBP descriptors to address the replay video attack on the face recognition system. LBP captures the
pigment information that printers leave on the image, and it also captures the change in reflectance due
to the quality variation of the attack equipment.
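
As an illustration of what an LBP descriptor computes, the following is a minimal sketch of the basic 3x3 operator and its normalized histogram. It assumes a grayscale image stored as a list of rows of integers; the cited works use richer variants (uniform patterns, multiple radii, color channels), so this is not a reimplementation of any of them.

```python
# Offsets of the eight neighbours, clockwise from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code of pixel (r, c): a bit is set where the neighbour >= centre."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

The histogram, rather than the per-pixel codes, is what would typically be passed to a classifier such as an SVM to separate live faces from printed or displayed ones.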

Frequency-based approaches analyze and quantify the frequency components of the image. Initial
work by Li et al. [23] was based on Fourier spectrum analysis and successfully detected photo attacks on
face biometrics. Liu et al. [24] also used Fourier spectra to detect video replay attacks. Other works, such
as [24], used discrete cosine transforms, Zhang et al. [14] made use of difference of Gaussian (DoG) filters,
and Peng and Chan [25] used high-frequency components.
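
One simple way to quantify frequency content is the share of spectral energy that lies above a cutoff frequency. The sketch below computes such a ratio with a direct 2D discrete Fourier transform on a small grayscale image. It rests on the assumption that a recaptured photo tends to lose fine detail and therefore shows a lower ratio than a live face; the cutoff value and the function names are illustrative and are not the descriptor of any cited work.

```python
import cmath
import math

def power_spectrum(image):
    """Squared magnitude of the 2D DFT, computed directly (fine for small images)."""
    rows, cols = len(image), len(image[0])
    spectrum = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            coeff = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2.0 * math.pi * (u * x / rows + v * y / cols)
                    coeff += image[x][y] * cmath.exp(1j * angle)
            spectrum[u][v] = abs(coeff) ** 2
    return spectrum

def high_frequency_ratio(image, cutoff=0.25):
    """Fraction of non-DC energy whose normalized frequency radius exceeds cutoff."""
    rows, cols = len(image), len(image[0])
    spectrum = power_spectrum(image)
    high, total = 0.0, 0.0
    for u in range(rows):
        for v in range(cols):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            fu = min(u, rows - u) / rows  # normalized frequency in [0, 0.5]
            fv = min(v, cols - v) / cols
            total += spectrum[u][v]
            if math.hypot(fu, fv) > cutoff:
                high += spectrum[u][v]
    return high / total if total > 1e-12 else 0.0
```

A checkerboard pattern concentrates its energy at the highest frequency and yields a ratio close to 1, while a slowly varying gradient yields a ratio close to 0. In practice a fast Fourier transform would replace the direct sum.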

Hybrid approaches combine more than one property or attribute associated with the spoofed
image. Many researchers have tried combining various attributes to better discriminate between fake and
real faces. For example, Raghavendra and Busch [26] combined the texture attribute with time-frequency
information, Maatta et al. [21] fused texture and shape, and Komulainen et al. [27] incorporated context
information. Hasan et al. [28] fused modified DoG filtering and binary pattern variance to identify photo
spoofing by analyzing both texture (LBP) and contrast (variance) characteristics. A support vector machine
(SVM) was applied to the extracted feature vectors, and they also show that LBP variance (LBPV) with an
SVM classifier gives better results than LBP with an SVM in presentation attack detection. Benlamoudi et
al. [29] used the Viola-Jones face detection algorithm [30] and a pictorial structure model [31] to detect the
face and then localize the eye positions; the coordinates of the eyes are then used to correct the pose of the
face. In the multi-block (MB) technique [32], the face is divided into square blocks and a texture descriptor
is applied to each block. Three popular texture descriptors, binarized statistical image features (BSIF), LBP,
and local phase quantization (LPQ), are used at multiple levels, forming multi-level feature descriptors
(FD-ML) for feature extraction. The ML representation is formed by combining multiple MBs [32]. Finally,
Lib-SVM [33] was used to classify the feature vectors as valid or invalid. Table 2 consolidates the
software-based static countermeasures, listing the methods adopted, types of attacks, datasets, and
performance metrics used.
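
The multi-block, multi-level idea can be sketched as follows. To keep the example self-contained, a plain intensity histogram stands in for the texture descriptor (LBP, LPQ, or BSIF in the cited work); the function names, the number of bins, and the chosen levels are assumptions made for illustration only.

```python
def split_into_blocks(image, blocks_per_side):
    """Divide the image into blocks_per_side x blocks_per_side rectangular blocks."""
    rows, cols = len(image), len(image[0])
    bh, bw = rows // blocks_per_side, cols // blocks_per_side
    blocks = []
    for br in range(blocks_per_side):
        for bc in range(blocks_per_side):
            blocks.append([row[bc * bw:(bc + 1) * bw]
                           for row in image[br * bh:(br + 1) * bh]])
    return blocks

def intensity_histogram(block, bins=4, max_value=256):
    """Stand-in descriptor: normalized histogram of integer pixel intensities."""
    hist = [0] * bins
    for row in block:
        for pixel in row:
            hist[pixel * bins // max_value] += 1
    total = sum(hist)
    return [h / total for h in hist]

def multi_level_descriptor(image, levels=(1, 2), descriptor=intensity_histogram):
    """Concatenate per-block descriptors over several block subdivisions."""
    features = []
    for blocks_per_side in levels:
        for block in split_into_blocks(image, blocks_per_side):
            features.extend(descriptor(block))
    return features
```

With levels (1, 2) the image contributes one whole-face histogram plus four quadrant histograms, so coarse and local texture statistics end up in the same feature vector that is passed to the classifier.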


Table 2. Software-based static anti-spoofing methods

| Contributors | Type of method | Static method | Attack type | Database | EER (%) | HTER (%) |
|---|---|---|---|---|---|---|
| Edmunds [10] | Texture | HSI-LBP color texture descriptor | Print, mobile and iPad | Replay-Attack | - | 44.1 |
| | | | | CASIA | 30 | - |
| | | | | MSU android | 30 | - |
| | | | | MSU laptop | 40 | - |
| Maatta et al. [21] | Texture | LBP | Photo attack | NUAA Photograph Imposter database | 2.8 | - |
| Chingovska et al. [22] | Texture | LBP | Replay video attack | Replay-Attack | - | 15 |
| Li et al. [23] | Frequency | Fourier spectrum analysis | Photo attack | NUAA Photograph Imposter database | 4.71 | 1.5 |
| Zhang et al. [14] | Frequency | DoG filters | - | - | - | - |
| Liu et al. [24] | Frequency | Fourier spectrum analysis | Replay video attack | Replay-Attack | - | - |
| Raghavendra and Busch [26] | Texture | Local features: eye (periocular) and nose region; global features: BSIF; SVM classification | 3D mask attacks | 3DMAD | - | 0.03 |
| Hasan et al. [28] | Texture | LBPV pattern representation and SVM | Photo attack | NUAA Photograph Imposter database | - | 0.39 |
| Benlamoudi et al. [29] | Representation + texture | LBP, LPQ, BSIF descriptors; Fisher score (sort the features in ascending order); SVM classification | Warped photo attack, cut photo attack, video attack | CASIA-FAS | 17.46 | - |
| | | | | MSU-MFS | 14.9 | - |
| | | | | Replay-Attack database | - | 12.25 |
| Li et al. [34] | Spatio-temporal | 3D convolutional neural network (CNN) | - | CASIA | 1.4 | - |
| | | | | MSU | - | - |
| George and Marcel [35] | Hybrid | Fully connected neural network-CNN | Replay attack | Replay Mobile dataset | - | 0 |
| Sharifi [36] | Texture | Fuses CNN and OVLBP | Print attacks, video attacks | OVLBP: Print Attack | - | 14.35 |
| | | | | OVLBP: Replay Attack | - | 15.75 |
| | | | | OVLBP+CNN: Print Attack | - | 10.40 |
| | | | | OVLBP+CNN: Replay Attack | - | 11.00 |
| Chen et al. [37] | Texture | Retinex-based LBP and region-based CNN ROI pooling | Replay attacks | Replay Attack | 0.093 | 0.206 |
| Pujol (2020) [32] | Texture | Entropy-based HOG (EBHOG) | Presentation attacks | CASIA FASD, MIFS | 9.5 | - |

State-of-the-art dynamic approaches are of three kinds: motion-based, texture-based, and hybrid
schemes (Refer to Table 3). Motion-based approaches employ structure from facial movements, which in
turn yields depth information for facial features. Kollreider et al. [38] proposed a liveness awareness
framework for face authentication that estimates face motion using lightweight optical flow and gives a
liveness score. A generalized dynamic approach for spoof detection was proposed by Li et al. [39], utilizing
pulses extracted from videos, based on the fact that only a live face exhibits a pulse (motion), whereas
printed photos and masks do not. Edmunds and Caplier [40] use conditional local neural fields to track the
face's motions, and a bag-of-words feature extraction method extracts rigid and non-rigid motion features
using the Fisher vector.

Texture-based approaches exploit the characteristics of face representations that discriminate a live
face from fake ones. The research community started by relying on LBP and its variants, and later fused
different characteristics to form hybrid schemes, which prove to be comparatively more efficient. Pereira et
al. [41] extended the regular LBP operator to the volume local binary pattern (VLBP) operator, which
captures spatiotemporal characteristics (dynamic texture), to detect the dynamics of face micro-textures.
Liu et al. [42] developed a remote photoplethysmography (rPPG) correlation model that extracts local
heartbeat signal patterns to discriminate between a mask and a live face. A confidence map was built from
the signal strength, exploiting the characteristics of the rPPG distribution on real faces. Boulkenafet et al.
[43] considered facial appearance to detect liveness. They extract features from color spaces and apply the
Fisher vector encoding method to these features for liveness detection. They tested on the benchmark
databases CASIA, Replay-Attack, and MSU-MFSD and claim that their method outperforms the best
methods available.
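
The pulse cue used by rPPG-style methods can be illustrated with a minimal sketch. It assumes that the mean green-channel value of the face region has already been extracted for every frame, and it searches for the strongest spectral peak within a plausible heart-rate band. The band limits and the function name are assumptions; this is a generic frequency analysis, not the correlation model of Liu et al. [42] or the method of Li et al. [39].

```python
import cmath
import math

def dominant_pulse_bpm(green_means, fps, low_hz=0.7, high_hz=4.0):
    """Return (beats per minute, spectral power) of the strongest in-band peak."""
    n = len(green_means)
    mean = sum(green_means) / n
    signal = [g - mean for g in green_means]  # remove the constant skin tone level
    best_freq, best_power = 0.0, 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n  # frequency of DFT bin k in Hz
        if not (low_hz <= freq <= high_hz):
            continue
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0, best_power
```

Under these assumptions a live face yields a clear peak at the heart rate, whereas a printed photo or a mask yields little or no in-band power, so the returned power could serve as a liveness cue after thresholding.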


Table 3. Software-based dynamic anti-spoofing methods

| Contributor | Type of method | Dynamic method | Attack type | Database | EER (%) | HTER (%) |
|---|---|---|---|---|---|---|
| Kollreider et al. [38] | Motion | Optical flow | Playback attack | Replay Attack database and proprietary | 0.5 | - |
| T. de Freitas et al. [41] | Motion | Spatio-temporal extension of LBP | Print attacks and video attacks | CASIA | 10 | - |
| | | | | Replay-Attack | - | 7.60 |
| Litong Feng et al. [44] | Texture and motion | Dense optical flow, hierarchical neural network | Print attacks, video attacks, mask attack | Replay Attack and 3DMAD | 0 | 0 |
| | | | | CASIA FASD | 5.83 | - |
| Siqi Liu et al. [42] | Heartbeat signal | Analyses heartbeat signal through rPPG | 3D mask attack | Public and self-contained datasets | - | - |
| Xiaobai Li et al. [39] | Motion | Pulse detection and color texture analysis | Print attacks, video attacks, mask attacks | 3DMAD and REALF masks | - | - |
| Z. Boulkenafet et al. [43] | Color spaces | Fisher vector encoding on the extracted features of different color spaces | Print attacks, video attacks | CASIA, Replay Attack, MSU-MFSD | - | - |
| Junying Gan et al. [45] | Video frames | 3D CNN | Video attacks | Replay Attack | - | 0.04 |
| | | | | CASIA | - | 10.65 |
| Yaojie Liu et al. [46] | Depth and signals | CNN-RNN model for face depth estimation and rPPG signal estimation | Presentation attacks | Proprietary database | - | - |
| Taiamiti Edmunds and Alice Caplier [40] | Motion | Conditional local neural fields face tracking | Presentation attacks | CASIA FASD, Replay-Attack, MSU-MFSD, 3DMAD | - | - |
| Lei Li et al. [47] | Texture | Deep LBP | Print attacks and video attacks | Replay-Attack, CASIA | - | - |
| Emna Fourati et al. [48] | Motion | Image quality assessment | Presentation attacks | Replay-Attack | - | 0.024 |
| H. Chen et al. [37] | Motion | R-CNN and improved retinex LBP | Video attacks | Replay-Attack | 0.093 | 0.26 |
| Xin Cheng et al. [49] | Texture and motion | Dynamic and texture fusion attention network | Replay and video attacks | CASIA-MFSD | 6.9 | - |
| | | | | Replay Attack | - | 2.2 |


Gan et al. [45] exploited spatio-temporal features of video frames using a 3D CNN, which
scored half total error rate (HTER) values (refer to the metrics for face anti-spoofing system evaluation
section) of 0.04% and 10.65% for the Replay Attack and CASIA databases respectively. Feng et al. [44]
proposed a hierarchical neural network model for anti-spoofing that integrates image quality and motion
cues, achieving 0% HTER and equal error rate (EER) for both the Replay Attack and 3DMAD datasets. In
the case of CASIA FASD, the framework achieved an EER of 5.83%. In addition, Pan et al. [50] utilize
conditional random fields (CRF) to model the eye blinks that occur in a live face for liveness
detection. CRF works on the context-based phenomena of temporal data.

The earlier works in the literature, using static or dynamic approaches, make use of
handcrafted features such as LBP and the scale invariant feature transform (SIFT) to discriminate between
real and fake faces. CNNs then started gaining popularity; they were first used for face anti-spoofing by
Yang et al. [51]. The following are a few instances where CNNs have been applied prominently.

Nikitin et al. [52] fused two deep classifiers: the first identifies the presence of a spoofing
medium, and the second analyzes the blinking of the eyes by classifying eye openness per frame.
Muhammad et al. [53] incorporated a novel sample learning-based recurrent neural network (SLRNN)
anti-spoofing architecture, which makes use of three models: CNN, sparse filtering, and long short-term
memory (LSTM). Sparse filtering was applied to augment the features using residual networks (ResNet).
The augmented features formed a sequence and were fed into an LSTM network to construct the final
representation. A 3D CNN framework was proposed by Li et al. [34], which takes both spatial and temporal
information, uses data augmentation, and employs a generalization regularization, thereby improving
generalization performance. George and Marcel [35] propose a dense fully connected neural network
architecture trained with pixel-wise binary supervision. Here a single CNN model uses frame-level
information, without requiring temporal data, to detect the presentation attack with deep pixel-wise
supervision. It achieved an HTER of 0% on the Replay Mobile dataset and an ACER of 0.42% in Protocol-1
of the OULU dataset. A CNN-RNN (recurrent neural network) combination model was employed by Liu et
al. [46] for face depth estimation and rPPG signal estimation, using pixel-wise and sequence-wise
supervision respectively; the depth information and rPPG signals are then fused. Li and Feng [47] used an
SVM to classify between real and fake faces by extracting handcrafted deep partial features from the
convolutional responses. Two sets of feature information, extracted from the CNN model and from OVLBP
(overlapped histograms of local binary patterns), are fused by Sharifi [36] to form a score vector. A majority
vote over the CNN, OVLBP, and fused scores helps in fake detection. Table 3 summarizes the
software-based dynamic anti-spoofing techniques contributed by various researchers.
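
The majority-voting step over three scores can be sketched as follows. The fusion rule (a simple mean) and the decision threshold are assumptions chosen for illustration; the actual fusion used by Sharifi [36] may differ.

```python
def majority_vote(decisions):
    """True if more than half of the boolean decisions are True ('live')."""
    votes = list(decisions)
    return sum(votes) > len(votes) / 2

def fuse_and_vote(cnn_score, ovlbp_score, threshold=0.5):
    """Vote over the CNN score, the OVLBP score, and their fused score."""
    fused = (cnn_score + ovlbp_score) / 2  # assumption: fusion by simple mean
    scores = (cnn_score, ovlbp_score, fused)
    return majority_vote(score >= threshold for score in scores)
```

With three voters, a face is accepted as live only when at least two of the three scores pass the threshold, so one confidently wrong classifier cannot decide the outcome alone.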


8. METRICS FOR FACE ANTI-SPOOFING SYSTEM EVALUATION 

A spoof detection system can incur two types of errors [9]: i) number of false acceptances (NFA):
the count of fraudsters accepted as authorized users, whose probability of occurrence is known as the
false acceptance rate (FAR); ii) number of false rejections (NFR): the count of authorized users who are
rejected as fraudsters, whose probability of occurrence is called the false rejection rate (FRR)
(Refer to Table 4 for the metrics). FAR and FRR trade off against each other: as the decision threshold
changes, lowering one generally raises the other. A receiver operating characteristic (ROC) curve is drawn
by computing all possible pairs of FAR and FRR values, as illustrated in Figure 4.


Figure 4. Relationship amongst the metrics on the ROC curve (axes: FAR and 1-FRR; EER and AUC marked)


The area under curve (AUC) metric is obtained as the integral of the ROC curve, the grey area in
Figure 4. EER is the point on the ROC curve at which FAR equals FRR, and HTER is the average of FAR
and FRR, reported at the point on the ROC curve where this average is minimum. Finally, for overall
accuracy (ACC) both authorized users and fraudsters are considered. A variant of ROC called DET
(detection error tradeoff) is also used in some cases for showing verification performance. The only
difference between ROC and DET is that the y-axis of the DET curve takes the false rejection rate instead
of the true acceptance rate.
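
The error metrics defined in this section can be computed from classifier scores as in the following minimal sketch. It assumes that a higher score means "more likely genuine" and that a sample is accepted when its score is at or above the threshold; the score lists are made-up illustrative values, and the EER is approximated by scanning the observed scores as candidate thresholds.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR = NFA / #Impostor and FRR = NFR / #Genuine at a given threshold."""
    nfa = sum(1 for s in impostor_scores if s >= threshold)  # fraudsters accepted
    nfr = sum(1 for s in genuine_scores if s < threshold)    # genuine users rejected
    return nfa / len(impostor_scores), nfr / len(genuine_scores)

def hter(far, frr):
    """Half total error rate: the average of FAR and FRR."""
    return (far + frr) / 2

def accuracy(far, frr, n_impostor, n_genuine):
    """Overall accuracy in percent, counting both impostors and genuine users."""
    errors = far * n_impostor + frr * n_genuine
    return 100 * (1 - errors / (n_impostor + n_genuine))

def equal_error_rate(genuine_scores, impostor_scores):
    """Return (threshold, EER) where FAR and FRR are closest to each other."""
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Hypothetical liveness scores for four genuine and four spoofed presentations.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.6]
far, frr = far_frr(genuine, impostor, threshold=0.5)
```

For these made-up scores one spoof and one genuine user fall on the wrong side of the threshold, so FAR and FRR are both 0.25 and the accuracy is 75%. Published results are usually reported on a held-out test set with the threshold fixed on a development set.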

For biometric authentication on Android smartphones, Google recognizes two types of attacks:
"impostor" attacks and "spoof" attacks. In an impostor attack, a fraudster pretends to be an authorized user
by disguising his or her features, whereas in a spoof attack, a non-live representation of the authorized
user, such as a photograph or video, is used to gain entry. For strong security, Google sets a threshold of 7%
or less on the attack accept rate, that is, the percentage of times an attack is not detected [11]. This is
analogous to a biometric "false accept rate", which represents the likelihood that a person is incorrectly
identified as a biometric match. Figure 5 shows the error metrics plotted for different static methods. The
X-axis shows the static method (one of texture, frequency, and hybrid) along with the type of attack to
which the method is applied, and the Y-axis measures the EER and HTER values for each method, if available.


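Table 4. Metrics commonly applied in face spoofing evaluation

| Metric | Stands for | Equation | Type |
|---|---|---|---|
| FAR | False Acceptance Rate | $\mathrm{FAR} = \dfrac{\mathrm{NFA}}{\#\mathrm{Impostor}}$ | Error |
| FRR | False Rejection Rate | $\mathrm{FRR} = \dfrac{\mathrm{NFR}}{\#\mathrm{Genuine}}$ | Error |
| EER | Equal Error Rate | $\mathrm{EER} = (\mathrm{FAR} = \mathrm{FRR})$ | Error |
| HTER | Half Total Error Rate | $\mathrm{HTER} = \dfrac{\mathrm{FAR} + \mathrm{FRR}}{2}$ | Error |
| ACC | Accuracy | $\mathrm{ACC} = 100 \times \left(1 - \dfrac{\mathrm{FAR} \times \#\mathrm{Impostor} + \mathrm{FRR} \times \#\mathrm{Genuine}}{\#\mathrm{Impostor} + \#\mathrm{Genuine}}\right)$ | Hit |
| AUC | Area Under Curve | $\mathrm{Area} = \int_a^b f(x)\,dx$, where $f : [a, b] \to \mathbb{R}$ | Hit |

Figure 5. EER-HTER error evaluation for software-based static methods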


Based on the statistics in the graph, it appears that of all the texture-based methods LBPV offers
the best spoof recognition, with the least error of 0.039 for photo attacks. For video attacks, the hybrid
approach of 3D CNN has the highest recognition rate of 100%, that is, 0 error. Raghavendra and Busch [26]
applied texture descriptors to 3D mask attacks; until now this is known to be the lowest error recorded
using texture descriptors. It has also been shown that video replay attacks have 0 HTER when CNNs
are used for recognition. It is observed that hybrid approaches, i.e. those in which multiple descriptors are
applied, have higher HTER and EER values. So, there is scope for research in this domain on whether
combining two descriptors from texture-based and frequency-based methods will yield better spoof
recognition and lower error rates.

The error metrics for dynamic methods are plotted in Figure 6, where the X-axis depicts one of the 
three dynamic descriptors (motion, texture, and hybrid) along with the attack type, and the Y-axis carries the 
HTER and EER values. Dense optical flow is best suited for identifying print attacks, as its reported HTER 
and EER values are 0. However, when a spatio-temporal (hybrid) descriptor or a 3D CNN is used for print 
attacks, the error rate captured on video frames is 0.04 and the HTER is around 10, which is not acceptable 
in anti-spoofing algorithms. 3D CNNs nonetheless find use here. Hence, in the literature, 
the application of texture descriptors over the face image has proven to be a good approach among static 
methods, and dense optical flow works very well for both print and video attacks among dynamic methods. 
Deep learning is, however, making inroads into the face recognition domain as well, and CNNs and deep neural 
networks are being used extensively to discriminate between genuine faces and spoofed faces. 
Figure 6. EER-HTER error evaluation for software based dynamic methods 


9. FACE ANTI-SPOOFING DATASETS 

Publicly available face spoof attack databases play a pivotal role in developing new and better face 
anti-spoofing techniques. A few of these databases are discussed here. Researchers around the 
world extensively use seven publicly available benchmark datasets to evaluate their anti-spoofing 
algorithms. NUAA Photograph Imposter [55], Replay-Attack [22], Print-Attack [55], CASIA face 
anti-spoofing [56], and MSU-MFSD [57] are the well-known datasets concerned with 2D attacks. For mask 
attacks, the 3D mask attack dataset [18] and Kose and Dugelay’s [58] dataset are used. 

The first public dataset made available for face anti-spoofing was the NUAA 
Photograph Imposter dataset [55]. Its images were captured by regular webcams in different environments 
under varied illumination conditions, in three sessions held two weeks apart. Printed flat and warped 
attacks are evaluated. 

The Print-Attack dataset [55] was the standard dataset used in the first competition for spoof 
detection. The images were captured by presenting a flat printed photo of an authorized person to the system 
either hand-held (i.e., the fraudster holds the photo in the hands) or on a fixed support (i.e., the photo is 
stuck on a fixed stand or wall). The Print-Attack dataset was extended to the Replay-Attack dataset [22] for 
evaluating video and photo attacks. It comprises 1,300 video clips of video and photo attacks. Three 
attack modes were considered: i) printed photo and video playbacks, ii) using a low-resolution mobile phone, 
and iii) an iPod screen. The CASIA face anti-spoofing dataset [56] has seven situations with different image 
qualities and various attack types. This dataset presents warped photo attacks, video playback attacks, and 
eye-cut photo attacks. 

The first public database for mask attacks is the 3D mask-attack dataset (3DMAD) [18], which 
comprises video sequences recorded with an RGB-D camera. ThatsMyFace13 manufactured the masks using the 
frontal and profile images of an individual. The MSU mobile face spoofing dataset (MSU MFSD) [57] has 280 
video clips of video and printed photo attacks. All the photos of the 35 participants used for the attacks were 
printed on large-sized paper with a color printer. Each participant’s video playback was taken to perform an 
attack. The MORPHO company created Kose and Dugelay’s [58] dataset, a paid mask dataset, using a 3D 
scanner. To obtain authorized images of face shape and texture, it uses structured light technology. Then 
Sculpteo 3D Printing technology 
manufactures the masks, which are then recaptured by the same sensor to obtain the fraudster images. 


10. CONCLUSION AND FUTURE DIRECTIONS 

Biometric systems, particularly face recognition systems, face many threats and challenges 
from fraudsters because of their vulnerabilities. Researchers have addressed these challenges 
by applying different types of anti-spoofing techniques, which make use of either 
hardware- or software-based solutions. In this review, we gave an overview of the state of the art of 
both static and dynamic methods used in software-based anti-spoofing techniques. A few performance 
metrics used in evaluating face anti-spoofing techniques, such as FAR, FRR, ROC, HTER, EER, and ACC, 
are discussed as well. We also consolidated a few publicly available benchmark databases for 2D 
presentation attacks on which anti-spoofing techniques can be applied. A lot of research 
has been carried out on databases where the type of presentation attack is known, and 
countermeasures have been devised for these known attacks. In the real world, however, we cannot integrate 
an anti-spoofing algorithm for only a single known type of attack. Fraudsters can attack the same system in 
different ways, by using a mask, presenting a printed photo, or running a video replay in place of an 
authorized user’s face; here the type of attack is not known. Though there are a few works on unknown attacks 
in which partial paper cuts and transparent masks are used, much more focus is required in this area. 
Prospective face recognition systems should be designed to detect the type of an unknown attack (also called 
zero-shot face anti-spoofing) and mitigate it. 